Management of XML Data by Means of Schema Matching

Author

  • Gunter Saake

Abstract

The eXtensible Markup Language (XML) has emerged as a de facto standard for representing and exchanging information among applications on the Web and within organizations, owing to its self-describing nature and its flexibility in organizing data. As a result, the amount of available (heterogeneous) XML data is rapidly increasing, and the need for high-performance techniques to manage these data is growing with it. A first step in managing these data is to identify semantic correspondences across heterogeneous XML data; this process is called XML schema matching. Schema matching plays a central role in several shared XML data applications, such as XML data integration, XML data migration, XML data clustering, and peer-to-peer systems. Consequently, many matching algorithms have been proposed and many matching systems have been developed. However, most of these systems produce scores for pairs of schema elements, which yields only simple (one-to-one) matches. Such results solve the schema matching problem only partially: a complete solution must discover complex matches as well as simple ones. Another dimension of schema matching that should be considered is scalability. Existing matching systems rely heavily on either rule-based or learning-based approaches. Rule-based systems represent the schemas to be matched in a common data model, such as schema trees or schema graphs, and then apply their algorithms to that model, which requires traversing the schema trees (or schema graphs) many times. By contrast, learning-based systems need considerable pre-match effort to train their learners. As a consequence, matching efficiency declines sharply, especially for large-scale schemas and in dynamic environments. Recent schema matching systems have been developed to improve matching efficiency; however, they consider only simple matching. Discovering complex matches while scaling to both a large number of schemas and large-scale schemas therefore remains a real challenge.

This thesis proposes a new matching approach, called sequence-based schema matching, to discover both simple and complex matches in the large-scale XML schema context. The approach exploits the Prüfer encoding method, which constructs a one-to-one correspondence between schema trees and sequences. As a result of sequence-
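
To illustrate the one-to-one correspondence between trees and sequences mentioned in the abstract, here is a minimal Python sketch of the classic Prüfer code for a labeled tree. It is not taken from the thesis, whose encoding of schema trees may differ (for example in how nodes are numbered or how element labels are carried along). The function names `prufer_encode` and `prufer_decode`, the node numbering 1..n, and the six-node example tree are assumptions made for this illustration only.

```python
import heapq
from collections import defaultdict


def prufer_encode(edges, n):
    """Classic Prüfer code of a tree on nodes 1..n, given as (u, v) edges."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Min-heap of current leaves, so the smallest-numbered leaf is removed first.
    leaves = [node for node in range(1, n + 1) if len(neighbors[node]) == 1]
    heapq.heapify(leaves)
    sequence = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        parent = neighbors[leaf].pop()   # a leaf has exactly one neighbor
        sequence.append(parent)
        neighbors[parent].discard(leaf)
        if len(neighbors[parent]) == 1:  # the parent has become a leaf
            heapq.heappush(leaves, parent)
    return sequence


def prufer_decode(sequence):
    """Rebuild the edge list of the tree encoded by a classic Prüfer code."""
    n = len(sequence) + 2
    degree = [1] * (n + 1)               # index 0 is unused
    for node in sequence:
        degree[node] += 1
    leaves = [node for node in range(1, n + 1) if degree[node] == 1]
    heapq.heapify(leaves)
    edges = []
    for node in sequence:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, node))
        degree[node] -= 1
        if degree[node] == 1:
            heapq.heappush(leaves, node)
    # Exactly two nodes remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


# Hypothetical schema tree: root 6 with children 4 and 5;
# node 4 has children 1 and 2; node 5 has child 3.
schema_edges = [(6, 4), (6, 5), (4, 1), (4, 2), (5, 3)]
code = prufer_encode(schema_edges, 6)
rebuilt = prufer_decode(code)
print(code)      # [4, 4, 5, 6]
print(rebuilt)   # [(1, 4), (2, 4), (3, 5), (4, 6), (5, 6)]
```

Decoding returns the same set of edges that was encoded, which is the sense in which the tree and its sequence determine each other. A matcher built on this idea could compare sequences instead of repeatedly traversing trees.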

Similar resources

Management of XML data by means of schema matching

XML Schema Definition is a recommendation from World Wide Web Consortium that specifies the elements All, News, Get Started, Evaluate, Manage, Problem Solve Consider niche tech XQuery to bring improvements to data integration. Georg Gottlob MASTER THESIS Schema Matching and Automatic Web Data can mean any model, for instance, an XML schema, interface definition, semantic management it became a ...

An Improved Semantic Schema Matching Approach

Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the Semantic Web [2], and schema integration. The task is to find the semantic correspondences between the elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...

Matching of Ontologies with XML Schemas Using a Generic Metamodel

Schema matching is the task of automatically computing correspondences between schema elements. A multitude of schema matching approaches exists for various scenarios using syntactic, semantic, or instance information. The schema matching problem is aggravated by the fact that models to be matched are often represented in different modeling languages, e.g. OWL, XML Schema, or SQL DDL. Consequen...

PLASMA: A Platform for Schema Matching and Management

This paper introduces an XML Schema management platform that promotes the use of matching techniques to fulfill the requirements of data integration and data exchange. Existing platforms on the market support only graphical, not automatic, matching. Researchers have proposed several matching algorithms to automate the discovery of correspondences between XML Schemas. Thes...

Semantic Web Technologies and Data Management

The Semantic Web aims to build a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It proposes using RDF as a flexible data model and ontologies to represent data semantics. Currently, relational models and XML tree models are widely used to represent structured and semi-structured data, but they offer limited means to captu...

Publication date: 1973